1. INTRODUCTION

The last couple of decades have seen major advances in biometric identification techniques.
A variety of biometric traits have been used for identification and verification, including the
face, iris, fingerprint, palm print, and others [1]-[4]. Biometric recognition systems can now attain
extremely high accuracy when tested against publicly available biometric datasets. The
performance of any single biometric system is nonetheless constrained by the intrinsic characteristics of the
biometric trait and the limits of sensing technology. Multimodal biometric fusion has therefore recently attracted
the interest of many researchers [5], [6]. Combining two or more biometric traits is an
efficient way to overcome some of the limitations of a single biometric system. It can improve
overall matching accuracy and strengthen the security of biometric systems. There are several ways to
study biometric fusion. One uses heterogeneous datasets [7], [8], which integrate
biometric characteristics from unrelated sources (such as a fingerprint from one database and a signature from
another). In such experiments, biometric characteristics from different people are combined to produce a
"chimeric user." Although this approach is frequently used in multimodal research, Poh and Bengio [9] found that
the performance measured in trials with chimeric users may not accurately reflect the performance of genuine
multimodal users. The most reliable way to study biometric fusion is to use homologous databases of
multimodal biometrics, in which the different biometric traits are actually collected from the same person. In this
article, a new homologous multimodal database including biometric characteristics for the hand, face, and iris is presented.
Additionally, one of the collected traits is used in a case study on face recognition with deep convolutional neural
networks (CNNs). Heterogeneous databases are best avoided in multimodal
biometrics, because correlating data from different sources can be problematic. However, creating homologous
multimodal databases poses significant challenges. Acquiring the necessary data usually takes longer, which can cause
subjects to respond negatively to extended acquisition sessions. Additionally, the database size and the
cost of acquiring the data are considerably higher. Such a project is typically much more challenging to
manage. Fortunately, recent research has focused on developing real multimodal databases with a wide variety of
biometric traits and many users. These databases are now accessible, but they have several
drawbacks, such as missing important traits or a lack of variety in sensors and attributes. Moreover, they
remain limited in number, because the complexity and challenges of collecting multimodal biometric databases
stem from technical, ethical, privacy, and logistical considerations. Overcoming these challenges requires careful
planning, collaboration, and adherence to legal and ethical frameworks to ensure the integrity and usability of the
collected data. Nevertheless, a few multimodal biometric databases are available. For example, the XM2VTS [10]
database combines face and speaker modalities, providing synchronized video and speech recordings. This
valuable resource lets researchers investigate the fusion of facial and speech cues. However,
one limitation of the XM2VTS database is its size and diversity, which could be expanded to cover a
broader range of subjects, lighting conditions, and other variations. The BANCA [11] multimodal biometric database
has been widely used in research and development in the field of biometrics. However, it has faced criticism for
certain limitations. One criticism is the relatively small sample size, which may limit the generalizability of
research findings. Additionally, the database focuses on only a few modalities, such as face and voice,
and so cannot support the evaluation of other important biometric modalities. Despite these criticisms, the
BANCA database still offers valuable data for studying multimodal biometric systems and their performance in
access control scenarios. Moreover, in comparison to other databases, the MCYT database [12] is relatively
simple and focuses predominantly on fingerprints and signatures as biometric modalities. This limitation
reduces its usefulness by excluding other types of biometric data.

By contrast, the DMCSv1 [13] multimodal biometric database, containing 3D face and hand scans,
provides researchers and developers in the biometrics field with a valuable resource.
However, it has certain limitations. One possible criticism is the database's relatively
small size, which can affect the generalizability and statistical robustness of research findings. Additionally,
3D face and hand scans are more complex to work with than the data in other types of biometric databases, which should
be taken into consideration when using the DMCSv1 database for biometric research. There are other databases
with more than two biometric characteristics, such as BIOMET [14], which includes a person's hand, voice,
fingerprint, and signature, and BioSec [15], which includes a person's face and eye movements. Other
multimodal biometric databases cover voice, iris, face, and fingerprints, as seen below. Multimodal biometric
databases, however, come with a number of difficulties. One major issue is the need to create algorithms that can
successfully combine data from several modalities. This is challenging
because different modalities may have different error rates and require different processing
strategies. In addition, because not all users may be able to supply data for all modalities, methods are needed for
coping with missing or incomplete data. Multimodal biometric databases remain an important field of study despite
these difficulties. Several of these databases are shown in Table 1 along with their properties. Our multimodal
biometric database (MULBv1) includes the face, hand, and iris traits of the same person; these traits were
not included together in any of the other databases.


Table 1. Multimodal biometric databases

Ref.  Year  Database name  No. of users  No. of sessions  No. of traits  Name of traits
[10]  1999  XM2VTS         295           4                2              Voice, 2D face
[11]  2003  BANCA          202           12               2              2D face, voice
[12]  2003  MCYT           330           1                2              Fingerprint, signature
[13]  2005  MyIDEA         104           3                6              Voice, face, signature, fingerprints, hand geometry, handwriting
[14]  2006  M3             32            3                3              Voice, 2D face, fingerprint
[15]  2007  BioSec         250           4                4              Voice, 2D face, fingerprint, iris
[16]  2008  IV2            300           1                3              Iris, 2D and 3D face
[17]  2011  SDUMLA-HMT     106           -                5              Gait, iris, finger vein, 2D face
[18]  2012  BIOMENT        91            3                2              2D face, fingerprint
[19]  2015  DMCSv1         35            2                2              Hand, 3D face
[20]  2017  BioSoft        75            1                7              2D face, ear, handwriting, iris, voice, soft biometrics, fingerprint


Bulletin of Electr Eng & Inf, ISSN: 2302-9285, Vol. 13, No. 1, February 2024: 677-685


The remainder of this paper is structured as follows: the properties of the MULBv1 database are
fully explained in section 2. The case study for face recognition using a deep convolutional neural network,
based on the face sub-database of MULBv1, is presented in section 3. Section 4 presents the results
of the experiment. Section 5 concludes the paper.


2. MULBV1 MULTIMODAL DATABASE 

The MULBv1 database was put together at Al-Furat Al-Awsat Technical University in Kufa, Iraq, 
during the winter of 2023. A total of 174 people, comprising 116 men and 58 women, between the ages of 17 
and 54, took part in the data gathering procedure. Each participant had their face, hand, and iris biometric 
features gathered, resulting in the creation of three sub-databases in MULBv1. It is crucial to remember that 
each person ID corresponds to a set of biometric features that were all obtained from the same person for 
each sub-database. Subsections will offer further details on each of the three sub-databases. 


2.1. Database of face 

A highly developed biometrics technique is facial trait recognition. Many studies focus on it [21]. A 
face database created exclusively for in-person face recognition is included in the MULBv1. The faces were 
photographed in a variety of situations, including diverse positions, emotions, and the inclusion of 
accessories like hats and spectacles. Environmental elements like lighting and background noise were left 
unrestricted to provide a genuine experience. For each person in the face database, there are 20 jpg image 
files with varying file sizes. The total size of the face database is 7.97 GB. 


2.2. Database of hand 

The human hand has enough anatomical characteristics to allow for personal identification. The
hand database in MULBv1 consists of 20 right-hand images per person, taken from various perspectives, some of
which show a ring. The hand database is made up of jpg image files of different sizes. The overall size of the
entire database is 10.3 GB. Sample images for the hand database are shown in Figure 1. 


Figure 1. Sample images from hand database with and without accessory 


2.3. Database of iris 

Iris recognition research has increased significantly during the past few years. The statistical analysis
in [22] found that the iris possesses the most reliable and constant features of all biometric traits.
Consequently, several recent studies employ the iris trait [23], [24]. We therefore provide an iris database in
MULBv1. The iris database includes 20 right-iris images for each person under different lighting conditions.
The iPhone 14 Pro Max's micro camera was used to take pictures of the iris while maintaining a 2 to 5 cm
gap between the device and the subject's eye. The images were saved in the jpg format and vary in
size. The overall size of the iris database is 1.30 GB. Sample images for the iris database are shown in
Figure 2.

Figure 2. Images from the iris database as examples


3. FACIAL RECOGNITION BASED ON DEEP CONVOLUTIONAL NEURAL NETWORK: CASE STUDY

Deep learning with CNNs is often used for facial identification. Since CNNs are built to
interpret images by detecting features across numerous layers, they are highly accurate at identifying
patterns in images. This section presents a case study for face recognition based on the face dataset from
MULBv1. CNNs are used to create a model that recognizes faces with high accuracy. A CNN is
a mathematical framework typically composed of three types of layers [25], [26]:
a. Convolutional layer: this layer extracts features from the input data to produce a feature map by applying
filters to the input data and carrying out element-wise multiplications and additions.
b. Pooling layer: this layer's function is to reduce the spatial dimensionality of the output by downsampling
the feature maps using techniques like max-pooling or average-pooling.
c. Fully connected (FC) layer: enables the network to learn nonlinear feature combinations by linking
every neuron in the previous layer to every neuron in the following layer.
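
The three layer types can be illustrated with a minimal pure-Python sketch. It is not the implementation used in the study: the function names (`conv2d_same`, `relu`, `max_pool_2x2`, `dense`) are hypothetical, and zero padding with stride 1 is assumed so that the convolution preserves the spatial size of its input.

```python
def conv2d_same(image, kernel):
    """Slide an odd-sized square kernel over a 2D image (zero padding, stride 1).

    Each output value is the sum of element-wise products between the kernel
    and the image patch centred on that position.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < h and 0 <= cc < w:  # outside the image counts as zero
                        total += image[rr][cc] * kernel[i][j]
            out[r][c] = total
    return out


def relu(fmap):
    """Apply the rectified linear unit to every value of a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap):
    """Downsample a feature map by keeping the maximum of each 2x2 window."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, w - 1, 2)]
        for r in range(0, h - 1, 2)
    ]


def dense(vector, weights, biases):
    """Fully connected layer: every input feeds every output neuron."""
    return [sum(x * w for x, w in zip(vector, row)) + b
            for row, b in zip(weights, biases)]


# Tiny 4x4 "image" and an identity kernel, used only for illustration.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
pooled = max_pool_2x2(relu(conv2d_same(image, identity)))
print(pooled)
```

With the identity kernel the convolution returns the image unchanged, so the pooled result simply halves each spatial dimension, which is the same size reduction a 2x2 max-pooling layer applies to a real feature map.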


3.1. Facial recognition model 

Figure 3 illustrates the three key steps of the proposed CNN model: preprocessing, CNN-based
feature extraction, and Softmax-based classification.

a. Pre-processing stage: 

The following list summarizes the various techniques used on the original images: i) face detection; 
ii) face cropping; and iii) face resizing to 200x200. 

b. Feature extraction and classification: 

The CNN model used in the study consists of eleven layers: three convolutional layers, three max-pooling
layers, one flatten layer, two FC layers, one dropout layer, and finally one output layer. Each
layer of the CNN is explained below:

— The first layer is a convolution layer; each convolution layer is followed by a rectified
linear unit (ReLU) activation function. Its output keeps the 200x200 pixel image size. The max-pooling layer
that follows scales the feature map down to 100x100 pixels.

— The third layer is the second convolution layer, whose output keeps the 100x100 size of its
input. A max-pooling layer is added after it to
provide an output with a 50x50 size.

— The fifth layer is the third convolution layer, which generates a feature map with a
50x50 pixel size. The output of the max-pooling layer that follows is 25x25.

— The flatten layer, which converts the feature map into a vector, is the seventh layer.

— The eighth layer is an FC layer, whose number of units depends on the preceding
layer as well as the required number of categories.

— The dropout layer, which is the ninth layer, is used to lessen network complexity and minimize
overfitting.

— The tenth layer is the Softmax layer, which uses the Softmax classifier, typically employed for multi-class
classification tasks, to identify subjects. The size of this layer grows with the number of classes. It is
employed at the network's top level, where its non-linear classification ability is valuable.

The proposed CNN architecture is listed in Table 2.

Figure 3. CNN architecture


Table 2. A brief summary of the proposed CNN architecture

Kind of layer  Filter size  Num. of filters  Form of input  Form of output  Num. of parameters
1-Conv2D       3*3          64               200, 200, 1    200, 200, 64    640
1-MaxPooling   2*2          -                200, 200, 64   100, 100, 64    0
2-Conv2D       3*3          32               100, 100, 64   100, 100, 32    18464
2-MaxPooling   2*2          -                100, 100, 32   50, 50, 32      0
3-Conv2D       3*3          32               50, 50, 32     50, 50, 32      9248
3-MaxPooling   2*2          -                50, 50, 32     25, 25, 32      0
Flatten        -            -                -              20000           0
FC1            -            -                -              512             10240512
Dropout        Rate: 0.5    -                -              512             0
FC2            -            -                -              174             89262

Total num. of parameters: 10,358,126
Trainable parameters: 10,358,126
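
The layer shapes and the parameter total can be checked with a short calculation. The sketch below is an illustration rather than the authors' code: the helper names are hypothetical, and it assumes size-preserving ("same") padding, one bias term per filter or output unit, and 2x2 pooling that halves each spatial dimension. Under those assumptions a convolution layer has `k*k*in_channels*filters + filters` parameters and an FC layer has `in_units*out_units + out_units`.

```python
def conv_params(kernel, in_channels, filters):
    """Weights (kernel*kernel*in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(in_units, out_units):
    """One weight per input/output pair plus one bias per output unit."""
    return in_units * out_units + out_units


def summarize(input_size=200, in_channels=1, filters=(64, 32, 32),
              kernel=3, hidden=512, classes=174):
    """Return (layer name, output shape, parameter count) for each layer."""
    size, channels = input_size, in_channels
    rows = []
    for n, f in enumerate(filters, start=1):
        # Assumed "same" padding: the convolution keeps the spatial size.
        rows.append((f"{n}-Conv2D", (size, size, f), conv_params(kernel, channels, f)))
        channels = f
        size //= 2  # 2x2 max pooling halves height and width
        rows.append((f"{n}-MaxPooling", (size, size, channels), 0))
    flat = size * size * channels
    rows.append(("Flatten", (flat,), 0))
    rows.append(("FC1", (hidden,), dense_params(flat, hidden)))
    rows.append(("Dropout", (hidden,), 0))
    rows.append(("FC2", (classes,), dense_params(hidden, classes)))
    return rows


rows = summarize()
total = sum(params for _, _, params in rows)
for name, shape, params in rows:
    print(f"{name:<14}{str(shape):<18}{params}")
print("Total parameters:", total)
```

Run with the values from the text (200x200 single-channel input, 64/32/32 filters of size 3x3, 512 hidden units, 174 classes), the per-layer counts add up to the stated total of 10,358,126, almost all of it in the first FC layer.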


There are some pros and cons of the proposed approach:

Pros:
i. Enhanced security: face recognition is a powerful security tool that can be used to reliably identify
people. It may be used in a variety of settings, including unlocking cellphones, entering restricted
locations, and confirming identities at border crossings or airports, possibly lowering fraud and unlawful
access.
ii. Convenience and efficiency: by doing away with the need for physical identity cards, passwords, or
PINs, face recognition offers convenience. Processes like access control, identity verification, and
attendance monitoring may be streamlined, saving time and easing administrative responsibilities.
iii. Surveillance and public safety: by detecting those involved in criminal activity or helping to find
missing persons, facial recognition technology can support efforts to improve public safety. It can be combined
with surveillance cameras to improve security in public areas and support law enforcement authorities'
investigations.

Cons:
i. Privacy concerns: because extremely sensitive and individual biometric data are gathered and stored,
the use of face recognition raises privacy issues. This data may be misused or handled improperly,
which might result in privacy violations.
ii. False positives and negatives: face recognition software is not always accurate; it occasionally produces
false positives (wrongly matching an individual) or false negatives (failing to recognize a known
individual). These mistakes can cause inconvenience, deny access to authorized users, or, if exploited,
create security vulnerabilities.
iii. Potential misuse and surveillance concerns: face recognition software may be used dishonestly or
maliciously for things like mass monitoring, tracking people without their permission, or restricting
personal liberties. Without sufficient protections and controls, the widespread use of facial recognition
technologies may have negative societal effects.


4. EXPERIMENTAL RESULTS 

The MULBv1 face sub-database includes 3480 frontal-face images with a bit depth of 24, divided among
174 participants. The images, in .jpg format, were taken under various bright lighting conditions with a
range of expressions, positions, and accessories. We divided them into 70% for training
(2437 images) and 30% for testing (1043 images).


4.1. Evaluation metrics 
To evaluate the model, this study uses a variety of metrics, including accuracy, loss function, F1
score, precision, and recall.
a. Accuracy: one of the most frequently used assessment metrics for recognition and
classification problems. It is the proportion of all predictions that are correct, as defined in (1) [27]:


$$\mathrm{Acc} = \frac{T_{pos} + T_{neg}}{T_{pos} + T_{neg} + F_{pos} + F_{neg}} \tag{1}$$


b. Loss function: used in machine learning to measure the difference between a model's predicted output
and the actual output. The ultimate goal of training a machine learning model is to minimize the loss
function, which raises the model's predictive accuracy. For classification problems, the cross-entropy
loss function is a popular choice. It measures the discrepancy between the predicted
probability distribution and the actual probability distribution of the labels. When compiling the model in
Keras, specify "categorical_crossentropy" as the loss function to use cross-entropy [28].

c. Precision: this determines the proportion of correctly produced positive predictions to all of the positive 
predictions, as determined by (2) [27]: 


$$\mathrm{Pre} = \frac{T_{pos}}{T_{pos} + F_{pos}} \tag{2}$$

d. Recall: this determines the percentage of correct positive predictions among all of the actual positive
occurrences, as defined in (3) [27]:

$$\mathrm{Rec} = \frac{T_{pos}}{T_{pos} + F_{neg}} \tag{3}$$

e. The F1 score: the harmonic mean of precision and recall, calculated by (4) [29]:

$$\mathrm{F1\ score} = 2 \cdot \frac{\mathrm{Pre} \cdot \mathrm{Rec}}{\mathrm{Pre} + \mathrm{Rec}} \tag{4}$$

Where: 

— True positive (Tpos): the model correctly predicts the instance that belongs to the positive class and 
assigns it a positive label. 

— True negative (Tneg): the model correctly recognizes the instance as belonging to the negative class and 
assigns it a negative label. 

— False positive (Fpos): the model incorrectly assigns a positive label to an instance that really
belongs to the negative class.

— False negative (Fneg): a positive class instance is given a negative label by the model, which is an 
inaccurate prediction. 
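
Equations (1) to (4) translate directly into code. The following is a minimal sketch with hypothetical function names and made-up confusion counts chosen only for illustration; they are not results from the study. For a 174-class problem such counts would be gathered per class (one class against the rest) and then averaged, and the averaging scheme used in the experiments is not stated here.

```python
def accuracy(t_pos, t_neg, f_pos, f_neg):
    """Equation (1): share of all predictions that are correct."""
    return (t_pos + t_neg) / (t_pos + t_neg + f_pos + f_neg)


def precision(t_pos, f_pos):
    """Equation (2): share of positive predictions that are correct."""
    return t_pos / (t_pos + f_pos) if (t_pos + f_pos) else 0.0


def recall(t_pos, f_neg):
    """Equation (3): share of actual positives that are found."""
    return t_pos / (t_pos + f_neg) if (t_pos + f_neg) else 0.0


def f1_score(pre, rec):
    """Equation (4): harmonic mean of precision and recall."""
    return 2 * pre * rec / (pre + rec) if (pre + rec) else 0.0


# Hypothetical counts for a single class, for illustration only.
t_pos, t_neg, f_pos, f_neg = 90, 80, 10, 20

acc = accuracy(t_pos, t_neg, f_pos, f_neg)
pre = precision(t_pos, f_pos)
rec = recall(t_pos, f_neg)
f1 = f1_score(pre, rec)
print(f"Acc={acc:.4f} Pre={pre:.4f} Rec={rec:.4f} F1={f1:.4f}")
```

The guards against a zero denominator are a design choice of this sketch: they return 0.0 when a class has no positive predictions or no positive instances, instead of raising an error.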


4.2. Assessment of the suggested model by measuring its accuracy, loss function, and F1 score 

A set of 2437 face images from the "MULBv1" multimodal database was used for training, while
a different set of 1043 images was used for testing. The network was trained with the Keras deep learning
framework, using Softmax as the classification function. After a number of training rounds, the model's
highest testing accuracy was found to be 97.41%, with a loss of 0.2799. The filter size for each
convolutional layer was 3x3, and the pooling window size was 2x2. The convolution layers contained 64, 32, and 32
kernels, in that order. The final CNN design was found after multiple attempts, and the proposed
network was selected because of its high accuracy. The model was refined repeatedly by modifying the max pooling,
kernel count, and number of convolutions until the best-performing model was
obtained on the "MULBv1" dataset. Sample attempts are presented in Table 3. Additionally, Figure 4
shows the relationship between the proposed model's accuracy and the number of training epochs.


Table 3. Sample attempts at building the model until the best-performing model was achieved on "MULBv1"
(the Conv2D columns give the number and size of filters)

Attempt  1_Conv2D    2_Conv2D   3_Conv2D   Max_pooling2D (1, 2, 3)  No. of hidden layers  Activation function  Accuracy  Loss function  F1 score
1        16, (5*5)   32, (5*5)  64, (3*3)  2*2                      256                   ReLU                 0.9626    0.3575         0.9621
2        32, (3*3)   64, (3*3)  64, (3*3)  2*2                      512                   Leaky ReLU           0.9674    0.3475         0.9671
3        64, (3*3)   32, (3*3)  32, (3*3)  2*2                      512                   ReLU                 0.9703    0.2842         0.9693
4        256, (3*3)  64, (3*3)  32, (3*3)  2*2                      128                   ReLU                 0.9713    0.2049         0.9708
5        64, (3*3)   32, (3*3)  32, (3*3)  2*2                      512                   ELU                  0.9741    0.2799         0.9736


Table 3 shows that using 64 filters of size 3x3 in conv_1, 32 filters of size 3x3 in
conv_2, and 32 filters of size 3x3 in conv_3, with 2x2 max-pooling, 512 hidden units,
and the ELU activation function, gives the highest accuracy and F1 score among the attempts. Figure 4 shows a
sharp rise in accuracy at the beginning, then a steady rise from epoch 10 onward, stabilizing around epoch 40.
Figure 5 shows how the proposed model's loss function (categorical cross-entropy) changes with the
number of epochs completed during the training phase.
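
To make the loss values easier to interpret, the sketch below computes Softmax probabilities and the categorical cross-entropy for a single sample in plain Python. It illustrates the standard definitions and is not the Keras implementation; the function names are hypothetical, and the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, true_index, eps=1e-12):
    """Negative log of the probability assigned to the true class."""
    return -math.log(max(probs[true_index], eps))


def mean_loss(batch_probs, labels):
    """Average cross-entropy over a batch of predictions."""
    losses = [categorical_cross_entropy(p, y) for p, y in zip(batch_probs, labels)]
    return sum(losses) / len(losses)


num_classes = 174  # number of subjects in MULBv1

# A model that guesses uniformly over all classes has loss ln(174).
uniform = [1.0 / num_classes] * num_classes
chance_loss = categorical_cross_entropy(uniform, true_index=0)

# A mean loss L corresponds to a geometric-mean true-class probability exp(-L).
reported_loss = 0.2799
typical_prob = math.exp(-reported_loss)

print(f"chance-level loss: {chance_loss:.3f}")
print(f"exp(-{reported_loss}): {typical_prob:.3f}")
```

Read this way, a loss near 5.16 would indicate guessing at chance level over 174 subjects, while the reported loss of 0.2799 corresponds to a geometric-mean probability of roughly 0.76 on the correct subject.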


Figure 4. The accuracy of the suggested model (training and testing accuracy)
Figure 5. The suggested model's loss function (training and testing loss)


The loss function in Figure 5 drops sharply at the beginning. A steady decline
follows from epoch 10, and the loss begins to stabilize at about
epoch 40. Figure 6 displays the accuracy measured with various image sizes.
High accuracy was obtained when the image size was 200x200.

Figure 6. The relation between accuracy and size of images


5. CONCLUSION 

Multimodal biometric security solutions greatly lessen the issues of unimodal systems, while
being more accurate than relying on a single biometric trait. There is always a possibility that attackers could
steal the data from unimodal biometric databases. However, the lack of significant public multimodal
datasets created under actual working conditions is one of the primary obstacles to creating, testing, and
evaluating biometric recognition systems. This paper presented the first version of a multimodal
biometric database based on homologous characteristics, named MULBv1, which
includes three distinct biometric traits: iris, hand, and face, for 174 individuals. A case study on facial
identification using deep CNNs was also reported, which uses the face traits to assess one of the gathered
biometrics. The results of the case study demonstrate the effectiveness of one of the collected biometric
characteristics. Research on diverse biometric fusion, an important area of biometric recognition research, will
benefit greatly from the proposed database. An updated database will be created in future work, and the
database will soon be available on Kaggle, mainly for research-related uses.